ABSTRACT


An approach to feature extraction based on functions of the class correlation
matrices is described. If linear functions of the correlation matrices are chosen,
the present method extends the method of feature extraction proposed by Fukunaga
and Koontz. If certain types of nonlinear functions are employed, the method
reduces to the orthogonal subspace method of Watanabe and Pakvasa.

Optimization of selected features through selection of appropriate functions is
discussed briefly. Preliminary results of classification of radar signatures using
the feature extraction methods described here are presented.
I. INTRODUCTION


The goal of feature extraction in pattern recognition is to reduce the dimensionality
of the space in which classes of data are represented without greatly reducing
the separability of the classes. An approach to linear methods of feature extraction
is described which is based on applying certain functions to the correlation matrices
of the classes to be separated. This approach to feature extraction was motivated by
experience with two other methods: that of Fukunaga and Koontz [1] and that of Watanabe
and Pakvasa [2]. The present report shows a relation between these two methods, and
provides a natural extension of the Fukunaga-Koontz method to the multiclass case.

In addition, the present formulation provides enough flexibility to optimize, in
principle, class separability in a very general way. This point is discussed in the report.

II. FORMULATION OF LINEAR FEATURE EXTRACTION

Consider the problem of generating features to classify patterns into one of K
distinct classes. The patterns are originally represented by vectors x in an
n-dimensional linear vector space (the "observation space"). The correlation matrix
for each class is defined by

    R_k = E_k[ x x^T ] = \int x x^T p_k(x) \, dx, \qquad k = 1, 2, \ldots, K    (1)

where E_k denotes expectation carried out using the probability density p_k of class k.

It is assumed that the correlation matrices satisfy the condition

    \| R_k \| = \lambda_{\max}^{(k)} \le 1, \qquad k = 1, 2, \ldots, K    (2)

where \lambda_{\max}^{(k)} is the largest eigenvalue of R_k. (This results in no loss of
generality since (2) can always be achieved by a linear scaling of the observation space.)
Thus, since the correlation matrix is positive definite, all of the eigenvalues of R_k lie
between 0 and 1.
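
The scaling argument can be illustrated with a minimal Python sketch. It assumes NumPy, sample estimates of the correlation matrices, and synthetic data; the function names `class_correlation` and `scale_to_unit_norm` are hypothetical and are not part of the report.

```python
import numpy as np

def class_correlation(X):
    """Sample estimate of R = E[x x^T]; rows of X are observation vectors."""
    return X.T @ X / X.shape[0]

def scale_to_unit_norm(R_list):
    """Scale the observation space by one common factor c so that every R_k
    has largest eigenvalue <= 1, which is condition (2)."""
    lam_max = max(np.linalg.eigvalsh(R)[-1] for R in R_list)
    c = 1.0 / np.sqrt(lam_max)          # x -> c x implies R -> c^2 R
    return c, [c**2 * R for R in R_list]

rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 4)) * [3.0, 1.0, 0.5, 0.2]   # hypothetical class 1 data
X2 = rng.normal(size=(200, 4)) * [0.2, 0.5, 1.0, 3.0]   # hypothetical class 2 data
c, (R1, R2) = scale_to_unit_norm([class_correlation(X1), class_correlation(X2)])
```

A single common factor is used for all classes so that the relative geometry of the classes is unchanged.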
In order to motivate the general approach, let {u_j, j = 1, 2, ..., n} be an
orthonormal basis for the observation space. Further, let x be any random vector and
let \hat{x} be a truncated expansion of x using m < n of the u_j:

    \hat{x} = \sum_{j=1}^{m} b_j u_j    (3)

A suitable set of features for the i-th class would result if one could choose the
u_j such that the mean-square error

    E_i[ |x - \hat{x}|^2 ] = \int |x - \hat{x}|^2 p_i(x) \, dx    (4)

is minimum and simultaneously the mean-square error

    E_k[ |x - \hat{x}|^2 ] = \int |x - \hat{x}|^2 p_k(x) \, dx, \qquad k = 1, 2, \ldots, K    (5)

is maximum for every k \ne i. Minimizing (4) without conditions (5) leads to the
well-known Karhunen-Loeve expansion [1], an optimum representation of a vector of
class i with m terms. The additional conditions (5), however, if satisfied, would ensure
that the basis chosen to optimally represent a vector as a member of class i would
simultaneously be non-optimal for representing it as a member of the other classes.

Since it is usually not possible to minimize (4) and maximize (5) simultaneously,
a related criterion will be derived. This leads to a generalization of the
Karhunen-Loeve expansion that applies to problems where class separability must be preserved.

Note first that the mean-square error in representation can be expressed as*

    E_k[ |x - \hat{x}|^2 ] = \sum_{j=m+1}^{n} u_j^T R_k u_j, \qquad k = 1, 2, \ldots, K    (6)

* Although this result is well known, a proof is given in the Appendix for convenience.
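
As a numerical check of (6), the following sketch compares the directly computed truncation error with the quadratic-form expression. It assumes NumPy, synthetic samples, and an arbitrary orthonormal basis obtained from a QR factorization; none of these choices come from the report.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 5, 2, 1000
X = rng.normal(size=(N, n)) @ rng.normal(size=(n, n))   # hypothetical samples of one class
R = X.T @ X / N                                         # sample correlation matrix, as in (1)

U, _ = np.linalg.qr(rng.normal(size=(n, n)))            # columns: an arbitrary orthonormal basis u_j
B = X @ U                                               # coefficients b_j = x^T u_j
X_hat = B[:, :m] @ U[:, :m].T                           # truncated expansion (3)

mse_direct = np.mean(np.sum((X - X_hat) ** 2, axis=1))          # left side of (6)
mse_formula = sum(U[:, j] @ R @ U[:, j] for j in range(m, n))   # right side of (6)
```

With the sample correlation matrix in place of the expectation, the two quantities agree to rounding error for any orthonormal basis.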
Then by virtue of (2) and the positive definite property of R_k, (6) is bounded by

    n - m \ge E_k[ |x - \hat{x}|^2 ] \ge 0, \qquad k = 1, 2, \ldots, K    (7)

As a result, maximizing (5) for k \ne i is equivalent to minimizing

    (n - m) - E_k[ |x - \hat{x}|^2 ] = \sum_{j=m+1}^{n} ( u_j^T u_j - u_j^T R_k u_j ) = \sum_{j=m+1}^{n} u_j^T (I - R_k) u_j    (8)

A single combined criterion is taken therefore as the sum of the criteria (4) and (8)
normalized by K, the number of classes, that is

    C_i = \frac{1}{K} \Big\{ E_i[ |x - \hat{x}|^2 ] + \sum_{k=1, k \ne i}^{K} \big( (n - m) - E_k[ |x - \hat{x}|^2 ] \big) \Big\}    (9)

where C_i is to be minimized. If (6) and (8) are substituted into (9) then C_i can be
expressed as

    C_i = \sum_{j=m+1}^{n} u_j^T \hat{G}_i u_j    (10)

where

    \hat{G}_i = \frac{1}{K} \Big[ R_i + \sum_{k=1, k \ne i}^{K} (I - R_k) \Big]    (11)

The vectors u_j that minimize (10) are the eigenvectors of \hat{G}_i corresponding to the
n - m smallest eigenvalues.* Since (10) is to be minimized for any m < n, the optimal
basis {u_j} is the set of eigenvectors of \hat{G}_i, and the eigenvectors chosen to express
\hat{x} should be those corresponding to the m largest eigenvalues of \hat{G}_i.

* A proof is given in the Appendix.
Note that if e is a normalized eigenvector of \hat{G}_i, then the corresponding
eigenvalue \mu can be expressed as

    \mu = e^T \hat{G}_i e = \frac{1}{K} \Big[ e^T R_i e + \sum_{k=1, k \ne i}^{K} ( 1 - e^T R_k e ) \Big]    (12)

Equation (2) and the positive definite property imply that each of the quadratic
products in (12) has a value between 0 and 1. Thus \mu lies between 0 and 1 and is close to
1 only if e^T R_i e is close to 1 and all of the e^T R_k e (k \ne i) are simultaneously
close to 0. Thus the eigenvectors of \hat{G}_i corresponding to eigenvalues near 1 relate to
important distinguishing features of class i.
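
A small sketch of (11) and (12) follows. It assumes NumPy and three hypothetical diagonal correlation matrices that already satisfy (2); the helper name `g_hat` is illustrative only.

```python
import numpy as np

def g_hat(R_list, i):
    """G-hat_i of (11): (1/K) [ R_i + sum over k != i of (I - R_k) ]."""
    K, n = len(R_list), R_list[0].shape[0]
    G = R_list[i].copy()
    for k, Rk in enumerate(R_list):
        if k != i:
            G += np.eye(n) - Rk
    return G / K

# Hypothetical correlation matrices, already scaled to satisfy (2).
R_list = [np.diag([0.9, 0.5, 0.1]), np.diag([0.1, 0.5, 0.9]), np.diag([0.2, 0.2, 0.2])]
mu, E = np.linalg.eigh(g_hat(R_list, 0))     # eigenvalues ascending, eigenvectors in columns
best = E[:, -1]                              # direction most characteristic of class 0
```

In this example the largest eigenvalue belongs to the first coordinate direction, where class 0 has high energy and the other classes have low energy.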


This approach can be generalized as follows. If A is a real symmetric matrix,
then the matrix function f(A) for any scalar function f can be defined as

    f(A) = V \, \mathrm{diag}\big( f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n) \big) \, V^T    (13)

where \lambda_j are the eigenvalues of A, and V is the orthonormal transformation that
diagonalizes A.* Since the columns of V are the eigenvectors of A, the function f serves
to "weight" the eigenvalues of A without changing its eigenvectors.

* For purposes of this report (13) is taken to be the definition of a function of a
symmetric matrix. This definition does not make any assumptions of analyticity on
the function f which are required for the extension of the matrix function concept to
more general matrices.

The foregoing concept can be applied to feature extraction. Define the matrices
G_i and H_i for i = 1, 2, ..., K by

    G_i = \frac{1}{K} \Big[ f_i(R_i) + \sum_{k=1, k \ne i}^{K} \big( I - f_k(R_k) \big) \Big]    (14a)

    H_i = h(G_i)    (14b)
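
The definitions (13) and (14a) translate directly into a short sketch. It assumes NumPy and weighting functions that accept an array of eigenvalues; the names `matrix_function` and `g_matrix` are hypothetical.

```python
import numpy as np

def matrix_function(f, A):
    """f(A) = V diag(f(lambda_j)) V^T for a real symmetric A, as in (13)."""
    lam, V = np.linalg.eigh(A)
    return (V * f(lam)) @ V.T            # scales column j of V by f(lambda_j)

def g_matrix(R_list, f_list, i):
    """G_i of (14a) for a list of preweighting functions, one per class."""
    K, n = len(R_list), R_list[0].shape[0]
    G = matrix_function(f_list[i], R_list[i])
    for k in range(K):
        if k != i:
            G = G + np.eye(n) - matrix_function(f_list[k], R_list[k])
    return G / K

A = np.array([[0.6, 0.2], [0.2, 0.3]])
A_sq = matrix_function(lambda lam: lam ** 2, A)   # should agree with A @ A
```

The postweighted matrix of (14b) would then be obtained as `matrix_function(h, g_matrix(R_list, f_list, i))` for a chosen function `h`.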


where the functions {f_k} and h are any functions mapping the interval [0, 1] into
[0, 1]. We refer to the functions {f_k} and h as the "preweighting" functions and the
"postweighting" function, respectively. Features are defined in terms of the
postweighted matrices H_i by one of two methods:


Method 1 - Features are chosen as the projection of the data
along selected eigenvectors of the matrices G_i. The postweighting
function h can serve to select the appropriate eigenvectors,
that is, h(\lambda) is 1 for a selected eigendirection and 0
for an eigendirection that is not selected.*

Method 2 - Features are defined by the relation

    z_i = x^T H_i x

Each of the features z_i, i = 1, 2, ..., K can be thought of as a weighted
projection of the observation vector x into a subspace of the observation
space.
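
The two feature definitions can be sketched as follows, assuming NumPy. The matrices `G` and `H` below are hypothetical diagonal examples, and the selection rule used to build `H` (keep eigenvalues of at least 0.5) is an assumption made only for illustration.

```python
import numpy as np

def method1_features(x, G, m):
    """Method 1: project x on the m eigenvectors of G with the largest eigenvalues."""
    lam, V = np.linalg.eigh(G)           # eigenvalues in ascending order
    return V[:, -m:].T @ x

def method2_feature(x, H):
    """Method 2: quadratic feature z = x^T H x."""
    return x @ H @ x

G = np.diag([0.9, 0.6, 0.1])             # hypothetical G_i
H = np.diag([1.0, 1.0, 0.0])             # h(G_i) for an assumed rule: select eigenvalues >= 0.5
x = np.array([1.0, 2.0, 3.0])
y = method1_features(x, G, 2)            # two projections (signs depend on eigenvector signs)
z = method2_feature(x, H)                # one scalar per class
```

When h takes only the values 0 and 1, the Method 2 feature equals the squared length of the Method 1 projections on the selected eigenvectors.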

* Since there is no guarantee that the selected eigenvectors from different G_i
will be independent, it may be necessary to eliminate those eigenvectors that can be
represented as linear combinations of the others.

III. THE TWO-CLASS CASE

For the special case of K = 2 the matrices G_1 and G_2 defined by (14a) satisfy the
relation

    G_2 = I - G_1    (15)

Therefore G_1 and G_2 have identical eigenvectors and their eigenvalues are related by

    \lambda_j^{(2)} = 1 - \lambda_j^{(1)}, \qquad j = 1, 2, \ldots, n    (16)

Since the \lambda_j^{(k)} all lie in the interval [0, 1], (16) shows that the eigenvectors
of G_k that provide the "most important" features for class 1 provide the "least important"
features for class 2 and vice versa. This is the principle upon which the method of
Fukunaga and Koontz is based (see Section IV).
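
Relations (15) and (16) hold for any choice of preweighting functions, which the following sketch checks numerically. It assumes NumPy, random correlation matrices scaled to satisfy (2), and two arbitrary preweighting functions chosen only for illustration.

```python
import numpy as np

def matrix_function(f, A):
    """f(A) defined through the eigendecomposition of a symmetric A, as in (13)."""
    lam, V = np.linalg.eigh(A)
    return (V * f(lam)) @ V.T

rng = np.random.default_rng(2)

def random_correlation(n):
    """Hypothetical correlation matrix with largest eigenvalue equal to 1."""
    M = rng.normal(size=(n, n))
    R = M @ M.T
    return R / np.linalg.eigvalsh(R)[-1]

n = 4
R1, R2 = random_correlation(n), random_correlation(n)
f1 = lambda lam: np.clip(lam, 0.0, 1.0) ** 2           # arbitrary preweighting on [0, 1]
f2 = lambda lam: np.sqrt(np.clip(lam, 0.0, 1.0))       # arbitrary preweighting on [0, 1]
F1, F2 = matrix_function(f1, R1), matrix_function(f2, R2)

G1 = 0.5 * (F1 + np.eye(n) - F2)       # (14a) with K = 2, i = 1
G2 = 0.5 * (F2 + np.eye(n) - F1)       # (14a) with K = 2, i = 2
lam1 = np.linalg.eigvalsh(G1)          # ascending
lam2 = np.linalg.eigvalsh(G2)          # ascending
```

Because the eigenvalues are returned in ascending order, the pairing in (16) shows up as `lam2` equal to one minus the reversed `lam1`.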


IV. LINEAR WEIGHTING

A simple form of preweighting function is a linear function

    f_k(R_k) = a_k R_k \quad (0 < a_k \le 1), \qquad k = 1, 2, \ldots, K    (17)

When this form of preweighting is used in the two-class case, the results can be
related to the Fukunaga-Koontz method of feature extraction.

For K = 2 (14a) becomes

    G_1 = \tfrac{1}{2} ( a_1 R_1 - a_2 R_2 + I )
    G_2 = \tfrac{1}{2} ( a_2 R_2 - a_1 R_1 + I )    (18)

Fukunaga and Koontz first perform a linear transformation of the observation space
which forces the correlation matrices R_1' and R_2' in the transformed space to satisfy

    a_1 R_1' + a_2 R_2' = I    (19)

The transformed correlation matrices automatically satisfy (2). Under this condition
(18) reduces to

    G_1 = a_1 R_1'
    G_2 = a_2 R_2'    (20)

and the two methods become identical.
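
The reduction from (18) to (20) can be checked with a short sketch. It assumes NumPy and uses a whitening of the weighted sum a_1 R_1 + a_2 R_2 as one transformation that produces condition (19); the matrices are random and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, a1, a2 = 4, 1.0, 1.0
M1, M2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
R1, R2 = M1 @ M1.T, M2 @ M2.T                     # hypothetical class correlation matrices

lam, V = np.linalg.eigh(a1 * R1 + a2 * R2)        # weighted sum, positive definite here
T = (V / np.sqrt(lam)).T                          # T (a1 R1 + a2 R2) T^T = I
R1p, R2p = T @ R1 @ T.T, T @ R2 @ T.T             # transformed matrices R_1', R_2'

I = np.eye(n)
G1 = 0.5 * (a1 * R1p - a2 * R2p + I)              # equation (18) in the transformed space
G2 = 0.5 * (a2 * R2p - a1 * R1p + I)
```

After the transformation, `G1` and `G2` coincide with `a1 * R1p` and `a2 * R2p`, so their eigenvectors are those of the transformed correlation matrices.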
When the two classes have different mean vectors m_1 and m_2 but equal covariance
matrices, and are weighted equally (a_1 = a_2 = a), (18) becomes

    G_1 = \tfrac{1}{2} \big( a ( m_1 m_1^T - m_2 m_2^T ) + I \big)
    G_2 = \tfrac{1}{2} \big( a ( m_2 m_2^T - m_1 m_1^T ) + I \big)    (21)

Such matrices have only two eigenvalues that are not equal to 1/2, and only the
corresponding two eigendirections contribute to the separation of the classes.
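
The eigenvalue count can be confirmed numerically. The sketch below assumes NumPy, a random common covariance matrix, and random mean vectors; scaling to satisfy (2) is omitted because only the number of eigenvalues different from 1/2 is examined.

```python
import numpy as np

rng = np.random.default_rng(4)
n, a = 6, 0.5
M = rng.normal(size=(n, n))
Sigma = M @ M.T                                   # hypothetical common covariance matrix
m1, m2 = rng.normal(size=n), rng.normal(size=n)   # hypothetical class means
R1 = Sigma + np.outer(m1, m1)                     # correlation = covariance + mean outer product
R2 = Sigma + np.outer(m2, m2)

G1 = 0.5 * (a * R1 - a * R2 + np.eye(n))          # equation (18) with a_1 = a_2 = a
eigvals = np.linalg.eigvalsh(G1)
n_informative = int(np.sum(~np.isclose(eigvals, 0.5)))   # eigenvalues that differ from 1/2
```

The common covariance cancels in the difference R_1 - R_2, leaving a matrix of rank two, which is why exactly two eigenvalues move away from 1/2 when the means are not parallel.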

Linear weighting is, of course, applicable to the multiclass case. Further,
when the correlation matrices are transformed to satisfy the condition

    \sum_{k=1}^{K} a_k R_k' = I    (22)

then linear weighting becomes an extension of the Fukunaga-Koontz method. The
matrices in (14a) assume the form

    G_i = \frac{1}{K} \big[ 2 a_i R_i' + (K - 2) I \big]    (23)

Although (16) has no direct analogy, the eigenvectors of G_i corresponding to
eigenvalues that are close to 1 are "most important" for representing class i and
simultaneously "least important" for representing the other classes. Therefore these
eigenvectors can be expected to produce the best features if Method 1 is employed.
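
Equation (23) follows from (14a) and (22) by substituting the sum over k \ne i with I - a_i R_i'. A minimal numerical check, assuming NumPy, three hypothetical classes, and a whitening transform as one way of enforcing (22):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 4, 3
a = np.array([0.5, 0.3, 0.2])                            # hypothetical class weights a_k
R = [M @ M.T for M in rng.normal(size=(K, n, n))]        # hypothetical correlation matrices

lam, V = np.linalg.eigh(sum(ak * Rk for ak, Rk in zip(a, R)))
T = (V / np.sqrt(lam)).T                                 # enforces condition (22)
Rp = [T @ Rk @ T.T for Rk in R]                          # transformed matrices R_k'

I = np.eye(n)

def G(i):
    """(14a) with linear preweighting f_k(R_k') = a_k R_k'."""
    return (a[i] * Rp[i] + sum(I - a[k] * Rp[k] for k in range(K) if k != i)) / K
```

For each class index i, `G(i)` should agree with `(2 * a[i] * Rp[i] + (K - 2) * I) / K`.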

V. NONLINEAR WEIGHTING

When the nonlinear functions shown in Fig. 1 (unit step functions) are used for
preweighting the correlation matrices, the result can be interpreted in terms of the
subspace method of feature extraction developed by Watanabe and Pakvasa.

Fig. 1. Unit step function, f_k(x) = u(x - a_k), 0 \le a_k \le 1.

The functions f_k(x) = u(x - a_k) map the eigenvalues of the correlation matrices
into 0 and 1 and thereby transform the correlation matrices into so-called orthogonal
projection operators. The projection operator P_k = f_k(R_k) corresponds to a subspace
S(P_k) of the observation space spanned by the eigenvectors of R_k whose eigenvalues
are greater than or equal to a_k. The projection operator P_k transforms any vector x
into another vector x_k, called the projection of x into S(P_k). A geometrical
interpretation is given in Fig. 2. The matrices of (14a) are expressed in terms of the
projection operators by

    G_i = \frac{1}{K} \Big[ P_i + \sum_{k=1, k \ne i}^{K} ( I - P_k ) \Big], \qquad i = 1, 2, \ldots, K    (24)


Watanabe calls the subspaces S(P_k) "representation subspaces" since they are
spanned by the eigenvectors of each class that provide the optimal mean-square
representation of that class. Feature subspaces are formed by removing the intersection
of the representation subspaces. If S(P_i') is the ith feature subspace then

    S(P_i') = S(P_i) \cap \Big[ \bigcap_{k=1, k \ne i}^{K} \overline{S(P_k)} \Big]    (25)

where \overline{S(P_k)} denotes the complement of S(P_k).
It can be shown [3] that the eigenvectors of the G_i in (24) corresponding to
eigenvalues of 1 span S(P_i'). In particular, if the postweighting function h(x) is
defined by

    h(x) = 1 if x = 1, and h(x) = 0 otherwise    (26)

then the projection operators for the feature subspaces are given by

    P_i' = h(G_i), \qquad i = 1, 2, \ldots, K    (27)

Features for the subspace technique are defined by Method 2, resulting in one feature
for each class.*
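
The chain from step preweighting to the feature-subspace projector can be sketched as follows. The example assumes NumPy, two hypothetical classes whose correlation matrices share eigenvectors, thresholds a_1 = a_2 = 0.5, and a small numerical tolerance in place of the exact test x = 1 in (26).

```python
import numpy as np

def matrix_function(f, A):
    """f(A) defined through the eigendecomposition of a symmetric A, as in (13)."""
    lam, V = np.linalg.eigh(A)
    return (V * f(lam)) @ V.T

def step(a):
    """Unit step preweighting f(x) = u(x - a): 1 where x >= a, else 0."""
    return lambda lam: (lam >= a).astype(float)

def h(lam, tol=1e-9):
    """Postweighting (26), with a tolerance because eigenvalues are computed numerically."""
    return (np.abs(lam - 1.0) < tol).astype(float)

Q, _ = np.linalg.qr(np.random.default_rng(6).normal(size=(3, 3)))   # hypothetical eigenvector basis
R1 = Q @ np.diag([0.9, 0.7, 0.05]) @ Q.T
R2 = Q @ np.diag([0.02, 0.8, 0.6]) @ Q.T

P1, P2 = matrix_function(step(0.5), R1), matrix_function(step(0.5), R2)
G1 = 0.5 * (P1 + np.eye(3) - P2)          # equation (24) with K = 2
P1_feat = matrix_function(h, G1)          # equation (27)

x = np.array([1.0, -2.0, 0.5])
z1 = x @ P1_feat @ x                      # Method 2 feature for class 1
```

Here the representation subspaces of the two classes overlap in the second basis direction, so the feature subspace of class 1 is spanned by the first column of `Q` alone.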

VI. REMARKS ABOUT OPTIMAL WEIGHTING

The Fukunaga-Koontz results show that equal linear weighting optimizes the
Divergence measure of separability in the two-class case with equal covariances [1].
It is probably very difficult to analytically determine weighting functions that would
optimize any measure of class separability in more general cases. It does seem
feasible to numerically optimize almost any criterion within certain classes of
parameterized weighting functions. Both the linear functions and step functions described
here are suitable choices for this type of optimization. Polynomial or piecewise-linear
functions could also be used.

When using Method 1 for defining features, one would choose the postweighting
function to select eigendirections corresponding to the largest m_k eigenvalues of each
G_k and optimize the parameters of the preweighting functions. When using Method 2
for defining features both the preweighting and the postweighting functions must be
optimized simultaneously and a dynamic programming approach may be appropriate.
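
As a rough illustration of numerical optimization over a parameterized family, the sketch below performs a grid search over one linear preweighting coefficient under Method 1. It assumes NumPy, synthetic two-class data, a single selected eigendirection, and a hypothetical separability score; the report does not prescribe a criterion or a search procedure.

```python
import numpy as np

def top_direction(a1, a2, R1, R2):
    """Eigenvector of G_1 = (a1 R1 - a2 R2 + I) / 2 with the largest eigenvalue (Method 1, m = 1)."""
    G1 = 0.5 * (a1 * R1 - a2 * R2 + np.eye(len(R1)))
    return np.linalg.eigh(G1)[1][:, -1]

def criterion(u, X1, X2):
    """Hypothetical separability score: |log| of the ratio of mean-square projections."""
    return abs(np.log(np.mean((X1 @ u) ** 2) / np.mean((X2 @ u) ** 2)))

rng = np.random.default_rng(7)
X1 = rng.normal(size=(300, 3)) * [1.0, 0.6, 0.2]     # hypothetical class 1 samples
X2 = rng.normal(size=(300, 3)) * [0.2, 0.6, 1.0]     # hypothetical class 2 samples
R1, R2 = X1.T @ X1 / len(X1), X2.T @ X2 / len(X2)
scale = max(np.linalg.eigvalsh(R1)[-1], np.linalg.eigvalsh(R2)[-1])
R1, R2 = R1 / scale, R2 / scale                      # enforce condition (2)

grid = np.linspace(0.1, 1.0, 10)                     # candidate values of a_2, with a_1 fixed at 1
scores = [criterion(top_direction(1.0, a2, R1, R2), X1, X2) for a2 in grid]
best_a2 = grid[int(np.argmax(scores))]
```

Any other criterion, such as an estimated error rate, could be substituted for `criterion`, and step thresholds or polynomial coefficients could replace the single parameter searched here.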


* When the number of classes is small, the portion of the observation space not
common to any of the feature subspaces may also be used to generate a feature.
VII. APPLICATION TO RADAR SIGNATURE CLASSIFICATION

Preliminary results from the application of the weighting function methods to
the classification of radar signatures are reported in this section. The data to be
classified consisted of 300 simulated radar signatures from each of two distinct
objects (a reentry vehicle and a decoy) in ballistic trajectories. Each signature was
represented in the observation space by a 30-dimensional vector corresponding to a
set of sequential returns received by the radar. The data were then mapped into a
3-dimensional feature space using the Fukunaga-Koontz technique, the linear weighting
technique, and the subspace technique.* Features were chosen according to
Method 1 for the Fukunaga-Koontz and linear weighting techniques and according to
Method 2 for the subspace technique. Fig. 3 shows results of classification in the
observation space and in each of the three feature spaces. A quadratic classifier
using the "leave-one-out" method was employed to produce the operating
characteristics. For a three-dimensional feature space, the Fukunaga-Koontz, linear
weighting, and subspace methods show comparable performance. When the feature
spaces for the Fukunaga-Koontz and linear weighting methods are increased to twelve
dimensions, performance approaches that of the classifier in the 30-dimensional
observation space. These examples show that it is possible to considerably reduce the
dimensionality of data through suitable linear transformations without greatly
reducing the separability.

* Equal weighting a_1 = a_2 = 1 was used for the former two techniques.
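
The evaluation procedure can be sketched in simplified form. The example below assumes NumPy, a Gaussian quadratic discriminant with equal priors, and synthetic three-dimensional features; it reports a single leave-one-out accuracy rather than a full operating characteristic, and it is not the classifier configuration used to produce Fig. 3.

```python
import numpy as np

def quadratic_score(x, X):
    """Gaussian log-likelihood of x (up to a constant) under the sample mean and covariance of X."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = x - mu
    return -0.5 * (d @ np.linalg.solve(S, d) + np.linalg.slogdet(S)[1])

def leave_one_out_accuracy(X1, X2):
    """Classify each sample using class statistics estimated without that sample."""
    correct = 0
    for own, other in ((X1, X2), (X2, X1)):
        for j in range(len(own)):
            rest = np.delete(own, j, axis=0)          # statistics of the sample's own class, sample removed
            correct += int(quadratic_score(own[j], rest) > quadratic_score(own[j], other))
    return correct / (len(X1) + len(X2))

rng = np.random.default_rng(8)
X1 = rng.normal(size=(60, 3))                         # hypothetical features, class 1
X2 = rng.normal(size=(60, 3)) * 3.0 + 4.0             # hypothetical features, class 2
acc = leave_one_out_accuracy(X1, X2)
```

An operating characteristic could be traced by adding a variable threshold to the score difference and recording detection and false alarm rates at each threshold.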
Fig. 3. Operating characteristic for quadratic classifier in feature spaces.
(Axes: correct RV classification (detection) probability; decoy misclassification
(false alarm) probability.)
APPENDIX

Proof of Results Relating to the Optimal Basis

1. Proof of Equation (6)

Given any orthonormal basis {u_j, j = 1, 2, ..., n}, a vector x in the observation
space can be represented by

    x = \sum_{j=1}^{n} b_j u_j    (A.1)

where

    b_j = x^T u_j, \qquad j = 1, 2, \ldots, n    (A.2)

Let \hat{x} be a truncated representation of x given by (3). Then for any class k one can
write

    E_k[ |x - \hat{x}|^2 ] = E_k\Big[ \Big( \sum_{j=m+1}^{n} b_j u_j^T \Big) \Big( \sum_{j=m+1}^{n} b_j u_j \Big) \Big] = \sum_{j=m+1}^{n} E_k[ b_j^2 ]    (A.3)

where the last equality derives from the orthonormal property of the basis. If (A.2)
and (1) are used in (A.3), the latter equation becomes

    E_k[ |x - \hat{x}|^2 ] = \sum_{j=m+1}^{n} E_k[ u_j^T x x^T u_j ] = \sum_{j=m+1}^{n} u_j^T R_k u_j    (A.4)

where R_k is the correlation matrix for class k.

2. Proof of Optimal Properties of the Eigenvectors of \hat{G}_i

It is desired to find the set of vectors u_{m+1}, u_{m+2}, ..., u_n that minimizes (10)
subject to the normality constraint

    u_j^T u_j = 1, \qquad j = m+1, m+2, \ldots, n    (A.5)

Let \mu_{m+1}, \mu_{m+2}, ..., \mu_n be Lagrange multipliers. A necessary condition for the
minimum is

    \nabla_{u_k} \Big[ \sum_{j=m+1}^{n} u_j^T \hat{G}_i u_j + \sum_{j=m+1}^{n} \mu_j ( 1 - u_j^T u_j ) \Big] = 0, \qquad k = m+1, m+2, \ldots, n    (A.6)

which reduces to the eigenvalue equation

    \hat{G}_i u_k - \mu_k u_k = 0, \qquad k = m+1, m+2, \ldots, n    (A.7)

where u_k are eigenvectors and \mu_k are the eigenvalues. If (A.7) is used in (10) then
the criterion C_i becomes

    C_i = \sum_{j=m+1}^{n} \mu_j    (A.8)

Therefore to minimize C_i, one must choose the eigenvectors of \hat{G}_i corresponding to
the n - m smallest eigenvalues.
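
The optimality result can be spot-checked numerically by comparing the criterion (10) at the eigenvector solution with its value for randomly drawn orthonormal bases. The sketch assumes NumPy and a random symmetric matrix standing in for \hat{G}_i; it is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 5, 2
M = rng.normal(size=(n, n))
G = M @ M.T
G /= np.linalg.eigvalsh(G)[-1]                 # hypothetical G-hat_i with eigenvalues in [0, 1]

mu, E = np.linalg.eigh(G)                      # eigenvalues in ascending order
# Criterion (10) when the discarded vectors are the eigenvectors with the n - m smallest eigenvalues.
C_opt = sum(E[:, j] @ G @ E[:, j] for j in range(n - m))

def criterion(U):
    """C of (10) when the last n - m columns of an orthonormal U are the discarded vectors."""
    return sum(U[:, j] @ G @ U[:, j] for j in range(m, n))

# Criterion values for 200 randomly drawn orthonormal bases.
C_random = [criterion(np.linalg.qr(rng.normal(size=(n, n)))[0]) for _ in range(200)]
```

None of the random bases should give a smaller criterion than `C_opt`, which equals the sum of the n - m smallest eigenvalues as stated in (A.8).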